    D5.3 Overview of Online Tutorials and Instruction Manuals

    Funding: UIDB/03213/2020; UIDP/03213/2020. The ELEXIS Curriculum is an integrated set of training materials which contextualizes ELEXIS tools and services inside a broader, systematic pedagogic narrative. This means that the goal of the ELEXIS Curriculum is not simply to inform users about the functionalities of particular tools and services developed within the project, but to show how such tools and services are (a) embedded in both lexicographic theory and practice, and (b) representative of and contributing to the development of digital skills among lexicographers. The scope and rationale of the curriculum are described in more detail in Deliverable D5.2, Guidelines for Producing ELEXIS Tutorials and Instruction Manuals. The goal of this deliverable, as stated in the project DOW, is to provide "a clear, structured overview of tutorials and instruction manuals developed within the project."

    A case study

    Funding: UIDB/03213/2020; UIDP/03213/2020. The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units with TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords of their own independent dictionary entries and those which appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between units which are not accompanied by an explicit definition and those that are: the former are encoded as form-like constructs, whereas the latter become entry-like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels, etc.). We also codify the use of attributes to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
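    As a concrete illustration of the distinction drawn in the abstract above, the sketch below builds two hypothetical TEI Lex-0 fragments with lxml: a transparent polylexical kept as a form-like construct inside its host entry, and a non-transparent one modelled as an entry-like construct with its own sense, label and definition. The element nesting, attribute values and example units are assumptions made for illustration, not the encoding recommended in the paper.

```python
# Illustrative sketch only: element and attribute choices below are assumptions
# based on general TEI dictionary markup, not the paper's recommendation.
from lxml import etree

TEI_NS = "http://www.tei-c.org/ns/1.0"

def tei(tag, text=None, **attrs):
    """Create a namespaced TEI element with optional text content and attributes."""
    node = etree.Element(f"{{{TEI_NS}}}{tag}", **attrs)
    node.text = text
    return node

# A lexicographically *transparent* polylexical (no explicit definition):
# recorded as a form-like construct inside its host entry.
transparent = tei("form", type="phrase")            # @type value is hypothetical
transparent.append(tei("orth", "custo de vida"))

# A lexicographically *non-transparent* polylexical (carries its own definition):
# modelled as an entry-like construct that can take sense numbers and labels.
opaque = tei("entry", type="relatedEntry")          # @type value is hypothetical
form = tei("form", type="phrase")
form.append(tei("orth", "pé-de-meia"))
opaque.append(form)
sense = tei("sense", n="1")
sense.append(tei("usg", "informal", type="register"))
sense.append(tei("def", "savings set aside for future needs"))
opaque.append(sense)

print(etree.tostring(transparent, pretty_print=True, encoding="unicode"))
print(etree.tostring(opaque, pretty_print=True, encoding="unicode"))
```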

    terms and their domains

    Funding: UIDB/03213/2020; UIDP/03213/2020; PTDC/LLT-LIN/6841/2020. Applying terminological methods to lexicography helps lexicographers deal with the terms occurring in general language dictionaries, especially when it comes to writing the definitions of concepts belonging to specialised fields. In the context of the lexicographic work on the Dicionário da Língua Portuguesa, an updated digital version of the most recent dictionary of the Academia das Ciências de Lisboa, published in 2001, we have assumed that terminology – in its dual dimension, both linguistic and conceptual – and lexicography are complementary in their methodological approaches. Both disciplines deal with lexical items, which can be lexical units or terms. In this paper, we apply terminological methods to improve the treatment of terms in general language dictionaries, to write definitions with greater precision and accuracy, and to specify the domains to which terms belong. In addition, we highlight the consistent modelling of lexicographic components, namely a hierarchy of domain labels, which serve as term identification markers, rather than a flat list of domains. The need to create and make available structured, organised and interoperable lexicographic resources has led us to follow a path in which the application of standards and best practices for treating and representing specialised lexicographic content is a fundamental requirement.
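    The abstract stresses a hierarchy of domain labels rather than a flat list. The minimal sketch below shows one way such a hierarchy could be represented and queried; the domain names and tree shape are illustrative assumptions, not the actual label inventory of the Dicionário da Língua Portuguesa.

```python
# Minimal sketch of a hierarchical domain-label inventory (as opposed to a flat list).
# Parent map: child label -> parent label; None marks a top-level domain.
DOMAIN_HIERARCHY = {
    "Sciences": None,
    "Medicine": "Sciences",
    "Cardiology": "Medicine",
    "Linguistics": "Sciences",
}

def domain_path(label):
    """Return the label's full path from the top-level domain down to itself."""
    path = []
    while label is not None:
        path.append(label)
        label = DOMAIN_HIERARCHY[label]
    return list(reversed(path))

def is_within(label, ancestor):
    """True if `label` falls under `ancestor` anywhere in the hierarchy."""
    return ancestor in domain_path(label)

print(domain_path("Cardiology"))            # ['Sciences', 'Medicine', 'Cardiology']
print(is_within("Cardiology", "Medicine"))  # True
```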

    Multiple Access Paths for Digital Collections of Lexicographic Paper Slips

    The paper describes the process of digitizing and annotating some 23,000 lexicographic paper slips compiled by the amateur lexicographer Dimitrije Čemerikić (1882-1960) to document the Serbian dialect of the historic city of Prizren. This previously unpublished dictionary of the Prizren dialect is an important resource not only for dialectologists and linguists, but also for ethnolinguists and ethnologists who are interested in various aspects of popular culture and urban life in the city of Prizren. The alphabetic arrangement of the macrostructure, however, is not conducive to exploratory searches: if users want to find out which dialect word corresponds to a standard Serbian word, or explore a certain type of vocabulary, they need access paths to the dictionary content that go beyond the indexing of the macrostructure. The paper describes an elaborate annotation strategy based on marking up headwords with standardized orthographic alternatives, providing lexical equivalents and assigning semantic fields to entries in order to achieve robust navigability and searchability of the collection without full-text transcription and/or structural data modelling.
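    A rough sketch of how the three annotation layers described above (standardized orthographic alternatives, lexical equivalents and semantic fields) could be combined into a single slip record and indexed to provide multiple access paths without full-text transcription. The field names and the example slip are invented for illustration and do not reproduce the project's actual data model.

```python
# Sketch: one record per digitized slip, plus a simple inverted index giving
# access paths beyond the alphabetical macrostructure. Illustrative only.
from dataclasses import dataclass, field
from collections import defaultdict

@dataclass
class Slip:
    headword: str                                        # dialect headword as written on the slip
    orth_variants: list = field(default_factory=list)    # standardized orthographic alternatives
    equivalents: list = field(default_factory=list)      # standard-language lexical equivalents
    semantic_fields: list = field(default_factory=list)  # assigned semantic fields
    image: str = ""                                      # scan of the slip (no transcription needed)

def build_index(slips):
    """Map every access key (headword, variant, equivalent, semantic field) to its slips."""
    index = defaultdict(list)
    for slip in slips:
        for key in [slip.headword, *slip.orth_variants,
                    *slip.equivalents, *slip.semantic_fields]:
            index[key.lower()].append(slip)
    return index

# Hypothetical example slip.
slips = [Slip(headword="дукат", equivalents=["златник"],
              semantic_fields=["trgovina"], image="slips/dukat_001.jpg")]
index = build_index(slips)
print([s.headword for s in index["златник"]])   # look up by standard-language equivalent
```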

    Ontologie des marques de domaines appliquée aux dictionnaires de langue générale

    In this article, we present OntoDomLab-Med, a domain-label ontology focused on the medical and health sciences. We have developed a taxonomy from the labels included in the list of abbreviations of the Dicionário da Língua Portuguesa Contemporânea of the Lisbon Academy of Sciences. Our goal is to connect OntoDomLab-Med to a set of selected entries from the dictionary which have been encoded in TEI Lex-0, a stricter and more suitable format than TEI for dictionary encoding, in line with the FAIR principles. The ontology, built with Protégé and represented in OWL, allows the knowledge to be exported in an interoperable exchange format, thereby enabling the ontology to be applied to different lexical resources and to reference domains regardless of the natural language being used. OntoDomLab-Med will be useful not only for searching information by domain, but will also allow the lexicographer to be more consistent in his or her work as both researcher and encoder of information for lexicographic purposes.
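    For readers who prefer a programmatic workflow to Protégé, the sketch below declares a tiny domain-label taxonomy as OWL classes with owlready2 and exports it as RDF/XML. The IRI and class names are hypothetical placeholders and do not reproduce the actual OntoDomLab-Med ontology.

```python
# Illustrative sketch (not the actual OntoDomLab-Med ontology): a small
# domain-label taxonomy declared with owlready2 and exported as OWL/RDF-XML.
from owlready2 import get_ontology, Thing

onto = get_ontology("http://example.org/ontodomlab-med-sketch.owl")  # placeholder IRI

with onto:
    class MedicalAndHealthSciences(Thing):   # top-level domain label
        pass
    class Medicine(MedicalAndHealthSciences):
        pass
    class Cardiology(Medicine):              # subdomain label
        pass
    class Pharmacology(MedicalAndHealthSciences):
        pass

# Export in an interoperable exchange format so the taxonomy can be reused
# across lexical resources, independently of the language of the labels.
onto.save(file="ontodomlab-med-sketch.owl", format="rdfxml")
print(list(onto.classes()))
```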

    TEI Lex-0 Etym – towards terse recommendations for the encoding of etymological information

    The present paper describes the etymological component of the TEI Lex-0 initiative, which aims at defining a terser subset of the TEI Guidelines for the representation of etymological features in dictionary entries. Going beyond the basic provision of etymological mechanisms in the TEI Guidelines, TEI Lex-0 Etym proposes a systematic representation of etymological and cognate descriptions by means of embedded constructs based on the <etym> (for etymologies) and <cit> (for etymons and cognates) elements. In particular, given that the potential contents of etymons are highly analogous to those of dictionary entries in general, the model heavily reuses many of the features and constraints introduced in other components of TEI Lex-0 and applies them to the encoding of etymologies and etymons. The TEI Lex-0 Etym model is also closely aligned with ISO 24613-3 on modelling etymological data and the corresponding TEI serialisation available in ISO 24613-4.
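    A minimal sketch of the pattern named in the abstract: an <etym> container holding a <cit>-based construct for an etymon. The attribute values, nesting and example content are assumptions for illustration, not a verbatim TEI Lex-0 Etym recommendation.

```python
# Illustrative sketch of an <etym>/<cit> construct built with lxml.
# Attribute values and nesting are assumptions made for this example.
from lxml import etree

TEI_NS = "http://www.tei-c.org/ns/1.0"

def tei(tag, text=None, **attrs):
    """Create a namespaced TEI element with optional text content and attributes."""
    node = etree.Element(f"{{{TEI_NS}}}{tag}", **attrs)
    node.text = text
    return node

etym = tei("etym", type="borrowing")        # @type value is hypothetical
etymon = tei("cit", type="etymon")
etymon.append(tei("lang", "Latin"))
form = tei("form")
form.append(tei("orth", "causa"))
etymon.append(form)
etymon.append(tei("def", "cause, reason"))
etym.append(etymon)

print(etree.tostring(etym, pretty_print=True, encoding="unicode"))
```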

    Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use Case

    [Note: due to the COVID-19 pandemic, the 12th edition of LREC was cancelled; the LREC 2020 proceedings are available at http://www.lrec-conf.org/proceedings/lrec2020/index.html] In this article, we will introduce two of the new parts of the new multi-part version of the Lexical Markup Framework (LMF) ISO standard, namely Part 3 of the standard (ISO 24613-3), which deals with etymological and diachronic data, and Part 4 (ISO 24613-4), which consists of a TEI serialisation of all of the prior parts of the model. We will demonstrate the use of both standards by describing the LMF encoding of a small number of examples taken from a sample conversion of the reference Portuguese dictionary Grande Dicionário Houaiss da Língua Portuguesa, part of a broader experiment comprising the analysis of different, heterogeneously encoded Portuguese lexical resources. We present the examples in the Unified Modelling Language (UML) and, in a couple of cases, also in TEI.
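    The sketch below mirrors the two-step idea described above: etymological data are first modelled as plain objects, loosely in the spirit of an LMF-style class model, and then serialised to a TEI-like fragment. The class names, attributes and output shape are assumptions made for illustration; they are not the vocabularies defined in ISO 24613-3 or ISO 24613-4.

```python
# Illustrative two-step sketch: object model first, TEI-like serialisation second.
from dataclasses import dataclass, field
from xml.sax.saxutils import escape, quoteattr

@dataclass
class Etymon:
    language: str      # source language of the etymon
    form: str          # written form of the etymon
    gloss: str = ""    # optional gloss

@dataclass
class Etymology:
    process: str                                 # e.g. "inheritance" (assumed value set)
    etymons: list = field(default_factory=list)

    def to_tei(self) -> str:
        """Serialise to a TEI-like <etym> fragment (illustrative only)."""
        lines = [f"<etym type={quoteattr(self.process)}>"]
        for e in self.etymons:
            lines.append(
                f'  <cit type="etymon"><lang>{escape(e.language)}</lang>'
                f"<form><orth>{escape(e.form)}</orth></form>"
                f"<def>{escape(e.gloss)}</def></cit>"
            )
        lines.append("</etym>")
        return "\n".join(lines)

# Hypothetical example: Portuguese "pedra" inherited from Latin "petra".
ety = Etymology(process="inheritance",
                etymons=[Etymon(language="Latin", form="petra", gloss="stone")])
print(ety.to_tei())
```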

    The advent of a new lexicographical Portuguese project

    Funding: UID/LIN/03213/2013. MORDigital is a newly funded Portuguese lexicographic project that aims to produce high-quality, searchable digital versions of the first three editions (1789; 1813; 1823) of the Diccionario da Lingua Portugueza by António de Morais Silva, preserving and making accessible this important work of European heritage. This paper describes the current state of the art, the project, its objectives and the proposed methodology, the latter of which is based on a rigorous linguistic analysis and will also include the steps necessary for the ontologisation of knowledge contained in and relating to the text. A section is dedicated to describing the project's various research domains. The output of the project will be made available via a dedicated platform.